import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import datetime
import warnings
warnings.filterwarnings("ignore")
sns.set()

India Road Accident (2021-22) Data Analysis




An Introduction to the Dataset
Road Accident Dataset for India (2021-22)
This dataset provides detailed information about road accidents in India for the years 2021 and 2022. It contains various attributes that have been used for exploratory data analysis (EDA) in Python to uncover patterns, trends, and insights related to road safety. The dataset is well suited to identifying key factors contributing to road accidents, assessing the impact of different conditions on accident severity, and developing strategies for improving road safety.
Importing necessary libraries
Read the Accident dataset
df = pd.read_excel('Road Accident India 2021-22.xlsx')
df_states = pd.read_csv('State wise Accidents data.csv')
Data Overview
df.head()
|  | Accident_Index | Accident Date | Month | Year | Day_of_Week | Junction_Control | Junction_Detail | Accident_Severity | Latitude | Light_Conditions | ... | Number_of_Casualties | Number_of_Vehicles | Police_Force | Road_Surface_Conditions | Road_Type | Speed_limit | Time | Urban_or_Rural_Area | Weather_Conditions | Vehicle_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200901BS70001 | 2021-01-01 | Jan | 2021 | Thursday | Give way or uncontrolled | T or staggered junction | Serious | 51.512273 | Daylight | ... | 1 | 2 | Metropolitan Police | Dry | One way street | 30 | 15:11:00 | Urban | Fine no high winds | Car |
| 1 | 200901BS70002 | 2021-01-05 | Jan | 2021 | Monday | Give way or uncontrolled | Crossroads | Serious | 51.514399 | Daylight | ... | 11 | 2 | Metropolitan Police | Wet or damp | Single carriageway | 30 | 10:59:00 | Urban | Fine no high winds | Taxi/Private hire car |
| 2 | 200901BS70003 | 2021-01-04 | Jan | 2021 | Sunday | Give way or uncontrolled | T or staggered junction | Slight | 51.486668 | Daylight | ... | 1 | 2 | Metropolitan Police | Dry | Single carriageway | 30 | 14:19:00 | Urban | Fine no high winds | Taxi/Private hire car |
| 3 | 200901BS70004 | 2021-01-05 | Jan | 2021 | Monday | Auto traffic signal | T or staggered junction | Serious | 51.507804 | Daylight | ... | 1 | 2 | Metropolitan Police | Frost or ice | Single carriageway | 30 | 08:10:00 | Urban | Other | Motorcycle over 500cc |
| 4 | 200901BS70005 | 2021-01-06 | Jan | 2021 | Tuesday | Auto traffic signal | Crossroads | Serious | 51.482076 | Darkness - lights lit | ... | 1 | 2 | Metropolitan Police | Dry | Single carriageway | 30 | 17:25:00 | Urban | Fine no high winds | Car |
5 rows × 23 columns
df_states
|  | _id | State/UT/City | Dangerous or Careless Driving/ Overtaking etc Cases | Dangerous or Careless Driving/ Overtaking etc Injured | Dangerous or Careless Driving/ Overtaking etc Died | Overspeeding Cases | Overspeeding Injured | Overspeeding Died | Driving under Influence of Drug/Alcohol Cases | Driving under Influence of Drug/Alcohol Injured | ... | Vehicles Parking at Road Shoulders Died | Causes Not Known Cases | Causes Not Known Injured | Causes Not Known Died | Other Causes Cases | Other Causes Injured | Other Causes Died | Total Road Accidents Cases | Total Road Accidents Injured | Total Road Accidents Died |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ANDHRA PRADESH | 2185 | 2271 | 755 | 16631 | 16188 | 6371 | 119 | 64 | ... | 18.0 | 121.0 | 119.0 | 32.0 | 2129.0 | 1957.0 | 817.0 | 21556.0 | 21040.0 | 8186.0 |
| 1 | 2 | ARUNACHAL PRADESH | 65 | 59 | 40 | 120 | 127 | 74 | 3 | 6 | ... | 0.0 | 9.0 | 4.0 | 7.0 | 38.0 | 37.0 | 28.0 | 261.0 | 266.0 | 173.0 |
| 2 | 3 | ASSAM | 886 | 833 | 347 | 4303 | 3237 | 1946 | 288 | 201 | ... | 45.0 | 42.0 | 0.0 | 10.0 | 89.0 | 95.0 | 21.0 | 7069.0 | 5420.0 | 3014.0 |
| 3 | 4 | BIHAR | 5039 | 4134 | 4071 | 2886 | 2348 | 2284 | 51 | 53 | ... | 95.0 | 20.0 | 12.0 | 22.0 | 101.0 | 70.0 | 77.0 | 9553.0 | 7946.0 | 7660.0 |
| 4 | 5 | CHHATTISGARH | 3536 | 3258 | 1750 | 6378 | 5603 | 2723 | 145 | 159 | ... | 71.0 | 455.0 | 220.0 | 258.0 | 1163.0 | 917.0 | 445.0 | 12395.0 | 10682.0 | 5413.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 88 | 89 | VARANASI | 41 | 26 | 60 | 36 | 14 | 35 | 9 | 3 | ... | 3.0 | 25.0 | 26.0 | 34.0 | 0.0 | 0.0 | 0.0 | 133.0 | 86.0 | 145.0 |
| 89 | 90 | VASAI VIRAR | 70 | 62 | 22 | 276 | 191 | 125 | 0 | 0 | ... | 0.0 | 1.0 | 0.0 | 1.0 | 5.0 | 3.0 | 2.0 | 352.0 | 256.0 | 150.0 |
| 90 | 91 | VIJAYAWADA | 124 | 129 | 20 | 1101 | 952 | 267 | 3 | 0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1228.0 | 1081.0 | 287.0 |
| 91 | 92 | VISHAKHAPATNAM | 31 | 26 | 6 | 1785 | 1166 | 261 | 41 | 7 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 460.0 | 313.0 | 95.0 | 2339.0 | 1533.0 | 368.0 |
| 92 | 93 | TOTAL (CITIES) | 14335 | 12062 | 3885 | 31753 | 27448 | 7415 | 1137 | 843 | ... | 77.0 | 1441.0 | 1228.0 | 328.0 | 4578.0 | 4344.0 | 778.0 | 55442.0 | 47523.0 | 13384.0 |
93 rows × 44 columns
#Dimensions
df.shape
(307973, 23)
df_states.shape
(93, 44)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 307973 entries, 0 to 307972
Data columns (total 23 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Accident_Index 307973 non-null object
1 Accident Date 307973 non-null datetime64[ns]
2 Month 307973 non-null object
3 Year 307973 non-null int64
4 Day_of_Week 307973 non-null object
5 Junction_Control 307973 non-null object
6 Junction_Detail 307973 non-null object
7 Accident_Severity 307973 non-null object
8 Latitude 307973 non-null float64
9 Light_Conditions 307973 non-null object
10 Local_Authority_(District) 307973 non-null object
11 Carriageway_Hazards 5424 non-null object
12 Longitude 307973 non-null float64
13 Number_of_Casualties 307973 non-null int64
14 Number_of_Vehicles 307973 non-null int64
15 Police_Force 307973 non-null object
16 Road_Surface_Conditions 307656 non-null object
17 Road_Type 306439 non-null object
18 Speed_limit 307973 non-null int64
19 Time 307956 non-null object
20 Urban_or_Rural_Area 307973 non-null object
21 Weather_Conditions 301916 non-null object
22 Vehicle_Type 307973 non-null object
dtypes: datetime64[ns](1), float64(2), int64(4), object(16)
memory usage: 54.0+ MB
df_states.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 93 entries, 0 to 92
Data columns (total 44 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 _id 93 non-null int64
1 State/UT/City 93 non-null object
2 Dangerous or Careless Driving/ Overtaking etc Cases 93 non-null int64
3 Dangerous or Careless Driving/ Overtaking etc Injured 93 non-null int64
4 Dangerous or Careless Driving/ Overtaking etc Died 93 non-null int64
5 Overspeeding Cases 93 non-null int64
6 Overspeeding Injured 93 non-null int64
7 Overspeeding Died 93 non-null int64
8 Driving under Influence of Drug/Alcohol Cases 93 non-null int64
9 Driving under Influence of Drug/Alcohol Injured 93 non-null int64
10 Driving under Influence of Drug/Alcohol Died 93 non-null int64
11 Physical Fatigue of Drivers Cases 93 non-null int64
12 Physical Fatigue of Drivers Injured 93 non-null int64
13 Physical Fatigue of Drivers Died 93 non-null int64
14 Defect in Mechanical Condition of Vehicle Cases 93 non-null int64
15 Defect in Mechanical Condition of Vehicle Injured 93 non-null int64
16 Defect in Mechanical Condition of Vehicle Died 93 non-null int64
17 Animal Crossing Cases 93 non-null int64
18 Animal Crossing Injured 93 non-null int64
19 Animal Crossing Died 93 non-null int64
20 Weather Condition (Total) Cases 93 non-null int64
21 Weather Condition (Total) Injured 93 non-null int64
22 Weather Condition (Total) Died 93 non-null int64
23 Weather Condition (Poor Visibility) Cases 93 non-null int64
24 Weather Condition (Poor Visibility) Injured 93 non-null int64
25 Weather Condition (Poor Visibility) Died 93 non-null int64
26 Weather Condition (Other Causes) Cases 93 non-null int64
27 Weather Condition (Other Causes) Injured 93 non-null int64
28 Weather Condition (Other Causes) Died 93 non-null int64
29 Lack of Road Infrastructure Cases 92 non-null float64
30 Lack of Road Infrastructure Injured 92 non-null float64
31 Lack of Road Infrastructure Died 92 non-null float64
32 Vehicles Parking at Road Shoulders Cases 92 non-null float64
33 Vehicles Parking at Road Shoulders Injured 92 non-null float64
34 Vehicles Parking at Road Shoulders Died 92 non-null float64
35 Causes Not Known Cases 92 non-null float64
36 Causes Not Known Injured 92 non-null float64
37 Causes Not Known Died 92 non-null float64
38 Other Causes Cases 92 non-null float64
39 Other Causes Injured 92 non-null float64
40 Other Causes Died 92 non-null float64
41 Total Road Accidents Cases 92 non-null float64
42 Total Road Accidents Injured 92 non-null float64
43 Total Road Accidents Died 92 non-null float64
dtypes: float64(15), int64(28), object(1)
memory usage: 32.1+ KB
#Check Null Values
df.isnull().sum()
Accident_Index 0
Accident Date 0
Month 0
Year 0
Day_of_Week 0
Junction_Control 0
Junction_Detail 0
Accident_Severity 0
Latitude 0
Light_Conditions 0
Local_Authority_(District) 0
Carriageway_Hazards 302549
Longitude 0
Number_of_Casualties 0
Number_of_Vehicles 0
Police_Force 0
Road_Surface_Conditions 317
Road_Type 1534
Speed_limit 0
Time 17
Urban_or_Rural_Area 0
Weather_Conditions 6057
Vehicle_Type 0
dtype: int64
df_states.isnull().sum()
_id 0
State/UT/City 0
Dangerous or Careless Driving/ Overtaking etc Cases 0
Dangerous or Careless Driving/ Overtaking etc Injured 0
Dangerous or Careless Driving/ Overtaking etc Died 0
Overspeeding Cases 0
Overspeeding Injured 0
Overspeeding Died 0
Driving under Influence of Drug/Alcohol Cases 0
Driving under Influence of Drug/Alcohol Injured 0
Driving under Influence of Drug/Alcohol Died 0
Physical Fatigue of Drivers Cases 0
Physical Fatigue of Drivers Injured 0
Physical Fatigue of Drivers Died 0
Defect in Mechanical Condition of Vehicle Cases 0
Defect in Mechanical Condition of Vehicle Injured 0
Defect in Mechanical Condition of Vehicle Died 0
Animal Crossing Cases 0
Animal Crossing Injured 0
Animal Crossing Died 0
Weather Condition (Total) Cases 0
Weather Condition (Total) Injured 0
Weather Condition (Total) Died 0
Weather Condition (Poor Visibility) Cases 0
Weather Condition (Poor Visibility) Injured 0
Weather Condition (Poor Visibility) Died 0
Weather Condition (Other Causes) Cases 0
Weather Condition (Other Causes) Injured 0
Weather Condition (Other Causes) Died 0
Lack of Road Infrastructure Cases 1
Lack of Road Infrastructure Injured 1
Lack of Road Infrastructure Died 1
Vehicles Parking at Road Shoulders Cases 1
Vehicles Parking at Road Shoulders Injured 1
Vehicles Parking at Road Shoulders Died 1
Causes Not Known Cases 1
Causes Not Known Injured 1
Causes Not Known Died 1
Other Causes Cases 1
Other Causes Injured 1
Other Causes Died 1
Total Road Accidents Cases 1
Total Road Accidents Injured 1
Total Road Accidents Died 1
dtype: int64
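The null counts above put `Carriageway_Hazards` at 302,549 missing values out of 307,973 rows, roughly 98%. A minimal sketch of how such a column could be flagged by its missing fraction rather than by eye follows. The toy frame and the 0.5 threshold are illustrative assumptions, not values taken from the dataset.

```python
import pandas as pd

# Toy frame (made-up values) standing in for the accident data
toy = pd.DataFrame({
    "Road_Type": ["Single carriageway", "Roundabout", None, "Slip road"],
    "Carriageway_Hazards": [None, None, None, "Other object on road"],
    "Speed_limit": [30, 30, 60, 70],
})

# Fraction of missing values per column
missing_fraction = toy.isnull().mean()

# Hypothetical cut-off: flag columns that are more than half empty
threshold = 0.5
mostly_empty = missing_fraction[missing_fraction > threshold].index.tolist()

# Drop the flagged columns without mutating the original frame
toy_clean = toy.drop(columns=mostly_empty)
print(mostly_empty)
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison.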
# Dropping 'Carriageway_Hazards' column
df.drop('Carriageway_Hazards', axis = 1, inplace = True)
Data Cleaning
Renaming Columns
# Renaming columns
df_states.rename(columns={
'_id': 'id',
'State/UT/City': 'state_city',
'Dangerous or Careless Driving/ Overtaking etc Cases': 'dangerous_driving_cases',
'Dangerous or Careless Driving/ Overtaking etc Injured': 'dangerous_driving_injured',
'Dangerous or Careless Driving/ Overtaking etc Died': 'dangerous_driving_died',
'Overspeeding Cases': 'overspeeding_cases',
'Overspeeding Injured': 'overspeeding_injured',
'Overspeeding Died': 'overspeeding_died',
'Driving under Influence of Drug/Alcohol Cases': 'drunk_driving_cases',
'Driving under Influence of Drug/Alcohol Injured': 'drunk_driving_injured',
'Driving under Influence of Drug/Alcohol Died': 'drunk_driving_died',
'Physical Fatigue of Drivers Cases': 'fatigue_cases',
'Physical Fatigue of Drivers Injured': 'fatigue_injured',
'Physical Fatigue of Drivers Died': 'fatigue_died',
'Defect in Mechanical Condition of Vehicle Cases': 'mechanical_defect_cases',
'Defect in Mechanical Condition of Vehicle Injured': 'mechanical_defect_injured',
'Defect in Mechanical Condition of Vehicle Died': 'mechanical_defect_died',
'Animal Crossing Cases': 'animal_crossing_cases',
'Animal Crossing Injured': 'animal_crossing_injured',
'Animal Crossing Died': 'animal_crossing_died',
'Weather Condition (Total) Cases': 'weather_total_cases',
'Weather Condition (Total) Injured': 'weather_total_injured',
'Weather Condition (Total) Died': 'weather_total_died',
'Weather Condition (Poor Visibility) Cases': 'poor_visibility_cases',
'Weather Condition (Poor Visibility) Injured': 'poor_visibility_injured',
'Weather Condition (Poor Visibility) Died': 'poor_visibility_died',
'Weather Condition (Other Causes) Cases': 'weather_other_cases',
'Weather Condition (Other Causes) Injured': 'weather_other_injured',
'Weather Condition (Other Causes) Died': 'weather_other_died',
'Lack of Road Infrastructure Cases': 'road_infrastructure_cases',
'Lack of Road Infrastructure Injured': 'road_infrastructure_injured',
'Lack of Road Infrastructure Died': 'road_infrastructure_died',
'Vehicles Parking at Road Shoulders Cases': 'parking_shoulder_cases',
'Vehicles Parking at Road Shoulders Injured': 'parking_shoulder_injured',
'Vehicles Parking at Road Shoulders Died': 'parking_shoulder_died',
'Causes Not Known Cases': 'unknown_causes_cases',
'Causes Not Known Injured': 'unknown_causes_injured',
'Causes Not Known Died': 'unknown_causes_died',
'Other Causes Cases': 'other_causes_cases',
'Other Causes Injured': 'other_causes_injured',
'Other Causes Died': 'other_causes_died',
'Total Road Accidents Cases': 'total_accidents_cases',
'Total Road Accidents Injured': 'total_accidents_injured',
'Total Road Accidents Died': 'total_accidents_died'
}, inplace=True)
df_states.head()
|  | id | state_city | dangerous_driving_cases | dangerous_driving_injured | dangerous_driving_died | overspeeding_cases | overspeeding_injured | overspeeding_died | drunk_driving_cases | drunk_driving_injured | ... | parking_shoulder_died | unknown_causes_cases | unknown_causes_injured | unknown_causes_died | other_causes_cases | other_causes_injured | other_causes_died | total_accidents_cases | total_accidents_injured | total_accidents_died |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | ANDHRA PRADESH | 2185 | 2271 | 755 | 16631 | 16188 | 6371 | 119 | 64 | ... | 18.0 | 121.0 | 119.0 | 32.0 | 2129.0 | 1957.0 | 817.0 | 21556.0 | 21040.0 | 8186.0 |
| 1 | 2 | ARUNACHAL PRADESH | 65 | 59 | 40 | 120 | 127 | 74 | 3 | 6 | ... | 0.0 | 9.0 | 4.0 | 7.0 | 38.0 | 37.0 | 28.0 | 261.0 | 266.0 | 173.0 |
| 2 | 3 | ASSAM | 886 | 833 | 347 | 4303 | 3237 | 1946 | 288 | 201 | ... | 45.0 | 42.0 | 0.0 | 10.0 | 89.0 | 95.0 | 21.0 | 7069.0 | 5420.0 | 3014.0 |
| 3 | 4 | BIHAR | 5039 | 4134 | 4071 | 2886 | 2348 | 2284 | 51 | 53 | ... | 95.0 | 20.0 | 12.0 | 22.0 | 101.0 | 70.0 | 77.0 | 9553.0 | 7946.0 | 7660.0 |
| 4 | 5 | CHHATTISGARH | 3536 | 3258 | 1750 | 6378 | 5603 | 2723 | 145 | 159 | ... | 71.0 | 455.0 | 220.0 | 258.0 | 1163.0 | 917.0 | 445.0 | 12395.0 | 10682.0 | 5413.0 |
5 rows × 44 columns
# Imputing numerical columns with mean
numerical_columns = df_states.select_dtypes(include=['float64'])
numerical_columns = numerical_columns.columns[numerical_columns.isnull().any()]
df_states[numerical_columns] = df_states[numerical_columns].fillna(df_states[numerical_columns].mean())
missing_counts = df_states.isnull().sum()
print(missing_counts)
id 0
state_city 0
dangerous_driving_cases 0
dangerous_driving_injured 0
dangerous_driving_died 0
overspeeding_cases 0
overspeeding_injured 0
overspeeding_died 0
drunk_driving_cases 0
drunk_driving_injured 0
drunk_driving_died 0
fatigue_cases 0
fatigue_injured 0
fatigue_died 0
mechanical_defect_cases 0
mechanical_defect_injured 0
mechanical_defect_died 0
animal_crossing_cases 0
animal_crossing_injured 0
animal_crossing_died 0
weather_total_cases 0
weather_total_injured 0
weather_total_died 0
poor_visibility_cases 0
poor_visibility_injured 0
poor_visibility_died 0
weather_other_cases 0
weather_other_injured 0
weather_other_died 0
road_infrastructure_cases 0
road_infrastructure_injured 0
road_infrastructure_died 0
parking_shoulder_cases 0
parking_shoulder_injured 0
parking_shoulder_died 0
unknown_causes_cases 0
unknown_causes_injured 0
unknown_causes_died 0
other_causes_cases 0
other_causes_injured 0
other_causes_died 0
total_accidents_cases 0
total_accidents_injured 0
total_accidents_died 0
dtype: int64
Handling Missing Values
# Imputing categorical columns with mode
categorical_columns = df.select_dtypes(include=['object'])
categorical_columns = categorical_columns.columns[categorical_columns.isnull().any()]
df[categorical_columns] = df[categorical_columns].fillna(df[categorical_columns].mode().iloc[0])
# Verifying if all missing values are handled
missing_counts = df.isnull().sum()
print(missing_counts)
Accident_Index 0
Accident Date 0
Month 0
Year 0
Day_of_Week 0
Junction_Control 0
Junction_Detail 0
Accident_Severity 0
Latitude 0
Light_Conditions 0
Local_Authority_(District) 0
Longitude 0
Number_of_Casualties 0
Number_of_Vehicles 0
Police_Force 0
Road_Surface_Conditions 0
Road_Type 0
Speed_limit 0
Time 0
Urban_or_Rural_Area 0
Weather_Conditions 0
Vehicle_Type 0
dtype: int64
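The mode imputation above relies on `DataFrame.mode()` returning one row per tied mode, so `.iloc[0]` picks the first mode of each column, and `fillna` with that Series fills each column with its own value. A small sketch on made-up rows (the values are illustrative only):

```python
import pandas as pd

# Made-up rows with one gap in each categorical column
toy = pd.DataFrame({
    "Road_Type": ["Single carriageway", "Single carriageway", None, "Roundabout"],
    "Weather_Conditions": ["Fine no high winds", None, "Fine no high winds", "Other"],
})

# First mode of every column, as a Series indexed by column name
modes = toy.mode().iloc[0]

# fillna with a Series aligns on column names
filled = toy.fillna(modes)
print(filled)
```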
#Statistical Overview
df_num = df.select_dtypes(include = ['int', 'float'])
df_num.describe()
|  | Year | Latitude | Longitude | Number_of_Casualties | Number_of_Vehicles | Speed_limit |
|---|---|---|---|---|---|---|
| count | 307973.000000 | 307973.000000 | 307973.000000 | 307973.000000 | 307973.000000 | 307973.000000 |
| mean | 2021.468934 | 52.487005 | -1.368884 | 1.356882 | 1.829063 | 38.866037 |
| std | 0.499035 | 1.339011 | 1.356092 | 0.815857 | 0.710477 | 14.032933 |
| min | 2021.000000 | 49.914488 | -7.516225 | 1.000000 | 1.000000 | 10.000000 |
| 25% | 2021.000000 | 51.485248 | -2.247937 | 1.000000 | 1.000000 | 30.000000 |
| 50% | 2021.000000 | 52.225943 | -1.349258 | 1.000000 | 2.000000 | 30.000000 |
| 75% | 2022.000000 | 53.415517 | -0.206810 | 1.000000 | 2.000000 | 50.000000 |
| max | 2022.000000 | 60.598055 | 1.759398 | 48.000000 | 32.000000 | 70.000000 |
Now the data is clean!
Highlights of the Report
Demographic Impact:
Young adults in the age group of 18 - 45 years accounted for 66.5% of the victims in 2022. Additionally, people in the working age group of 18 – 60 years constituted 83.4% of the total road accident fatalities.
Road-User Categories:
Among road-user categories, two-wheeler riders had the highest share in total fatalities, representing 44.5% of persons killed in road accidents in 2022. Pedestrian road-users were the second-largest group, with 19.5% of fatalities.
International Comparison:
India has the highest number of total persons killed due to road accidents, followed by China and the United States. Venezuela has the highest rate of persons killed per 1,00,000 population.
📌 Total casualties resulting from the accidents
Number of Road Accidents:
In 2022, a total of 4,17,883 road accidents occurred in India, leading to 1,68,491 fatalities and 4,43,366 people injured. These figures represent an 11.9% year-on-year increase in accidents, a 9.4% rise in fatalities, and a substantial 15.3% surge in the number of people injured compared to the previous year.
print('Total Casualties took place after the accident is : ',df['Number_of_Casualties'].sum())
Total Casualties took place after the accident is : 417883
📌 Total Casualties & percentage of total with respect to accident severity and maximum casualties by type of vehicle

Road Accident Distribution:
32.9% of accidents took place on National Highways and Expressways, 23.1% on State Highways, and the remaining 43.9% on other roads. 36.2% of fatalities occurred on National Highways, 24.3% on State Highways, and 39.4% on other roads.
df1 = df['Accident_Severity'].value_counts()
df1
Accident_Severity
Slight 263280
Serious 40740
Fatal 3953
Name: count, dtype: int64
sns.barplot(x = df1.index, y = df1.values, palette = 'bright')
plt.title('Total Casualties with respect to accident severity')
plt.xlabel('Severity')
plt.ylabel('Total Casualty')
plt.show()
df2 = df['Accident_Severity'].value_counts(normalize = True)
df2
Accident_Severity
Slight 0.854880
Serious 0.132284
Fatal 0.012836
Name: proportion, dtype: float64
plt.figure(figsize=(6,6))
colors = sns.color_palette('bright')
df2.plot(kind = 'pie', autopct = '%2.2f%%', colors = colors, startangle = 70)
plt.ylabel(None)
plt.show()
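`value_counts()` above counts accidents per severity level. If casualty totals per severity are wanted, as the section heading suggests, summing `Number_of_Casualties` within each group is one way to get them. A sketch on made-up rows, not the real data:

```python
import pandas as pd

# Made-up accidents: one row per accident
toy = pd.DataFrame({
    "Accident_Severity": ["Slight", "Slight", "Serious", "Fatal", "Slight"],
    "Number_of_Casualties": [1, 2, 3, 1, 1],
})

# Number of accidents per severity level
accidents = toy["Accident_Severity"].value_counts()

# Number of casualties per severity level
casualties = toy.groupby("Accident_Severity")["Number_of_Casualties"].sum()

# Percentage of all casualties falling in each level
share = casualties / casualties.sum() * 100

print(pd.DataFrame({"accidents": accidents,
                    "casualties": casualties,
                    "share_pct": share.round(1)}))
```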
📌 Total Casualties with respect to vehicle type
Vehicle Categories:
Two-wheelers, for the second consecutive year, accounted for the highest share in both total accidents and fatalities in 2022. Light vehicles, including cars, jeeps, and taxis, ranked a distant second.
df3 = df['Vehicle_Type'].value_counts().sort_values(ascending = False)
df3
Vehicle_Type
Car 239794
Van / Goods 3.5 tonnes mgw or under 15695
Motorcycle over 500cc 11226
Bus or coach (17 or more pass seats) 8686
Motorcycle 125cc and under 6852
Goods 7.5 tonnes mgw and over 6532
Taxi/Private hire car 5543
Motorcycle 50cc and under 3703
Motorcycle over 125cc and up to 500cc 3285
Other vehicle 2516
Goods over 3.5t. and under 7.5t 2502
Minibus (8 - 16 passenger seats) 821
Agricultural vehicle 749
Pedal cycle 66
Ridden horse 3
Name: count, dtype: int64
Cars account for the highest number of accidents: 239,794.
Excluding Car
df_nocar = df[df['Vehicle_Type'] != 'Car']
df_11 = df_nocar['Vehicle_Type'].value_counts().sort_values(ascending = False).head(8)
sns.barplot(x = df_11.values, y = df_11.index, palette = 'viridis')
plt.title('Total Casualties with respect to vehicle type')
plt.show()
df4 = df[df['Year']== 2021]['Month'].value_counts()
df5 = df[df['Year']== 2022]['Month'].value_counts()
plt.figure(figsize=(10, 6))
plt.plot(df4.index, df4.values, marker='o', label='2021')
plt.plot(df5.index, df5.values, marker='o', label='2022')
plt.xlabel('Month')
plt.ylabel('Count')
plt.title('Monthly Accidents Comparison for 2021 and 2022')
plt.legend()
plt.grid(True)
plt.show()
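Because `value_counts()` sorts by frequency, the months in `df4` and `df5` are not guaranteed to be in calendar order, which can scramble the x-axis of the comparison. One possible approach, sketched here on made-up dates, is to count by the month number taken from `Accident Date` and sort the index. The helper name `month_no` is an illustrative choice.

```python
import pandas as pd

# Made-up accident dates across two years
toy = pd.DataFrame({
    "Accident Date": pd.to_datetime([
        "2021-03-02", "2021-01-15", "2021-03-20", "2021-02-11",
        "2022-02-01", "2022-02-17", "2022-01-09",
    ]),
})
toy["Year"] = toy["Accident Date"].dt.year

# Count accidents per (year, month number), then put years in columns
monthly = (
    toy.groupby(["Year", toy["Accident Date"].dt.month.rename("month_no")])
       .size()
       .unstack("Year", fill_value=0)
       .sort_index()  # calendar order: 1, 2, 3, ...
)
print(monthly)
```

Each column of `monthly` can then be plotted against the shared, chronologically ordered index.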
📌 Maximum Casualties by Road type
df6 = df['Road_Type'].value_counts()
df6
Road_Type
Single carriageway 230612
Dual carriageway 45467
Roundabout 20929
One way street 6197
Slip road 3234
Name: count, dtype: int64
plt.figure(figsize=(10,6))
sns.barplot(x = df6.index, y = df6.values, palette = 'magma')
plt.xlabel('Road Type')
plt.ylabel('Casualties')
plt.title('Casualties by Road type')
plt.show()
📌 Relation between Casualties by Area/ Location & by Day/ Night
grouped_data = df.groupby(['Urban_or_Rural_Area','Light_Conditions'])['Number_of_Casualties'].sum().reset_index()
grouped_data
|  | Urban_or_Rural_Area | Light_Conditions | Number_of_Casualties |
|---|---|---|---|
| 0 | Rural | Darkness - lighting unknown | 1620 |
| 1 | Rural | Darkness - lights lit | 16724 |
| 2 | Rural | Darkness - lights unlit | 658 |
| 3 | Rural | Darkness - no lighting | 24215 |
| 4 | Rural | Daylight | 118802 |
| 5 | Urban | Darkness - lighting unknown | 2209 |
| 6 | Urban | Darkness - lights lit | 65443 |
| 7 | Urban | Darkness - lights unlit | 880 |
| 8 | Urban | Darkness - no lighting | 1171 |
| 9 | Urban | Daylight | 186161 |
pivot_table = grouped_data.pivot_table(index='Urban_or_Rural_Area', columns='Light_Conditions', values='Number_of_Casualties', fill_value=0)
pivot_table
| Urban_or_Rural_Area | Darkness - lighting unknown | Darkness - lights lit | Darkness - lights unlit | Darkness - no lighting | Daylight |
|---|---|---|---|---|---|
| Rural | 1620.0 | 16724.0 | 658.0 | 24215.0 | 118802.0 |
| Urban | 2209.0 | 65443.0 | 880.0 | 1171.0 | 186161.0 |
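Raw totals are hard to compare because urban areas record more casualties overall. Dividing each row by its own total gives the share of casualties under each lighting condition within an area. The sketch below re-enters the totals from the pivot table above; with them, unlit darkness accounts for roughly 15% of rural casualties and under 1% of urban ones.

```python
import pandas as pd

# Casualty totals copied from the pivot table above
pivot = pd.DataFrame(
    {
        "Darkness - lighting unknown": [1620, 2209],
        "Darkness - lights lit": [16724, 65443],
        "Darkness - lights unlit": [658, 880],
        "Darkness - no lighting": [24215, 1171],
        "Daylight": [118802, 186161],
    },
    index=pd.Index(["Rural", "Urban"], name="Urban_or_Rural_Area"),
)

# Divide each row by its own total to get within-area percentages
share = pivot.div(pivot.sum(axis=1), axis=0) * 100
print(share.round(1))
```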
plt.figure(figsize=(12,6))
sns.barplot(x = 'Urban_or_Rural_Area' , y = 'Number_of_Casualties', hue = 'Light_Conditions', data = grouped_data, palette = 'viridis' )
plt.xlabel('Region')
plt.ylabel('Casualties')
plt.title('Casualties by Area/Light Conditions')
plt.show()
📌 Speed Limit vs No of casualties
plt.figure(figsize=(12,6))
sns.lineplot(x = 'Speed_limit', y = 'Number_of_Casualties', data = df)
plt.xlabel('Speed Limit')
plt.ylabel('Casualties')
plt.title('Casualties by Speed limit')
plt.grid(True)
plt.show()
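`sns.lineplot` aggregates repeated x-values, by default plotting their mean with a confidence band, so the line above shows the average number of casualties per accident at each speed limit rather than a total. The same averages can be inspected directly with a groupby. The rows below are made up for illustration.

```python
import pandas as pd

# Made-up accidents at three speed limits
toy = pd.DataFrame({
    "Speed_limit": [30, 30, 30, 60, 60, 70],
    "Number_of_Casualties": [1, 1, 2, 2, 4, 3],
})

# Mean (what the line shows), plus sum and count for context
summary = (
    toy.groupby("Speed_limit")["Number_of_Casualties"]
       .agg(["mean", "sum", "count"])
)
print(summary)
```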
📌 No of Vehicles vs No of Casualties
plt.figure(figsize=(12,6))
sns.lineplot(x = 'Number_of_Vehicles', y = 'Number_of_Casualties', data = df)
plt.xlabel('No of vehicles')
plt.ylabel('Casualties')
plt.title('Casualties by No of vehicles')
plt.grid(True)
plt.show()
📌 Casualties by Day of the week
df7 = df['Day_of_Week'].value_counts()
plt.figure(figsize=(12, 5))
sns.barplot(x = df7.index, y = df7.values, palette = 'gnuplot_d')
plt.xlabel('Day of the Week')
plt.ylabel('Number of Accidents')
plt.title('Number of Accidents by Day of the Week')
plt.grid(True)
plt.show()
📌 URBAN VS RURAL Casualties
Rural vs. Urban Accidents:
About 68% of road accident deaths occurred in rural areas, with urban areas contributing 32% to the total accident deaths in the country.
df['Urban_or_Rural_Area'].value_counts()
Urban_or_Rural_Area
Urban 198532
Rural 109441
Name: count, dtype: int64
# Group by Urban/Rural Area and Road Surface Conditions
surface_counts = df.groupby(['Urban_or_Rural_Area', 'Road_Surface_Conditions']).size().unstack()
# Plot stacked column chart
surface_counts.plot(kind='bar', stacked=True, figsize=(8, 6))
plt.xlabel('Urban or Rural Area')
plt.ylabel('Number of Accidents')
plt.title('Distribution of Accidents by Road Surface Conditions and Urban/Rural Area')
plt.xticks(rotation=0)
plt.legend(title='Road Surface Conditions')
plt.show()
State-Specific Data:
Tamil Nadu recorded the highest number of road accidents in 2022, with 13.9% of the total accidents, followed by Madhya Pradesh with 11.8%. Uttar Pradesh had the highest number of fatalities due to road accidents (13.4%), followed by Tamil Nadu (10.6%). Understanding state-specific trends is essential for targeted interventions.
📌 Total Accidents, Injuries and Deaths by State/UT
df8 = df_states[(df_states['state_city'] != 'TOTAL (STATES)') & (df_states['state_city'] !='TOTAL (CITIES)')]
df_cases = df8.groupby('state_city')['total_accidents_cases'].sum().sort_values(ascending = False).head(10)
plt.figure(figsize=(12, 5))
sns.barplot(x = df_cases.index, y =df_cases.values, palette = 'magma')
plt.xlabel('States and UTs')
plt.ylabel('Total Accident Cases')
plt.title('Number of Accidents by States')
plt.xticks(rotation = 90)
plt.grid(True)
plt.show()
df9 = df_states[(df_states['state_city'] != 'TOTAL (STATES)') & (df_states['state_city'] !='TOTAL (CITIES)')]
df_died = df9.groupby('state_city')['total_accidents_died'].sum().sort_values(ascending = False).head(10)
# Horizontal bars: deaths on the x-axis, states on the y-axis
sns.barplot(x = df_died.values, y =df_died.index, palette = 'coolwarm')
plt.xlabel('Total Deaths')
plt.ylabel('States and UTs')
plt.title('Number of Deaths by States')
plt.xticks(rotation = 90)
plt.grid(True)
plt.show()
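Absolute counts favour large states. A complementary view is the number of deaths per 100 recorded accidents, which indicates how deadly an average accident is in each state. The sketch below uses only the totals from the five `df_states` rows displayed earlier; the column name `deaths_per_100_accidents` is an illustrative choice.

```python
import pandas as pd

# Totals copied from the first five rows of df_states shown earlier
sample = pd.DataFrame({
    "state_city": ["ANDHRA PRADESH", "ARUNACHAL PRADESH", "ASSAM",
                   "BIHAR", "CHHATTISGARH"],
    "total_accidents_cases": [21556, 261, 7069, 9553, 12395],
    "total_accidents_died": [8186, 173, 3014, 7660, 5413],
}).set_index("state_city")

# Deaths per 100 recorded accidents
sample["deaths_per_100_accidents"] = (
    sample["total_accidents_died"] / sample["total_accidents_cases"] * 100
)
print(sample.sort_values("deaths_per_100_accidents", ascending=False).round(1))
```

Among these five rows, Bihar has far fewer accidents than Andhra Pradesh but about 80 deaths per 100 accidents, more than double Andhra Pradesh's rate.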
📌 States/UTs-wise Total Number of Persons Killed in Road Accidents on State Highways from 2018 to 2021
import json
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook"
# Load the GeoJSON with a context manager so the file is closed afterwards
with open("states_india.geojson", "r") as f:
    india_states = json.load(f)
# Map each state name to its state_code and expose it as the feature id
state_id_map = {}
for feature in india_states["features"]:
    feature["id"] = feature["properties"]["state_code"]
    state_id_map[feature["properties"]["st_nm"]] = feature["id"]
df_s = pd.read_excel("state wise accident deaths.xlsx")
df_s["id"] = df_s["States/UTs"].apply(lambda x: state_id_map[x])
india_states['features'][1]['properties']
{'cartodb_id': 2, 'state_code': 35, 'st_nm': 'Andaman & Nicobar Island'}
df_s.head()
|  | States/UTs | Killed in Highway accident 2018 | id |
|---|---|---|---|
| 0 | Andhra Pradesh | 1897 | 28 |
| 1 | Arunanchal Pradesh | 58 | 12 |
| 2 | Assam | 681 | 18 |
| 3 | Bihar | 1474 | 10 |
| 4 | Chhattisgarh | 1068 | 22 |
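The `apply(lambda x: state_id_map[x])` lookup raises a `KeyError` as soon as a state name in the spreadsheet is spelled differently from the GeoJSON's `st_nm`. A sketch of a more forgiving lookup uses `Series.map`, which leaves unmatched names as NaN so they can be listed and corrected. The dictionary and the misspelled name below are made-up stand-ins, not the contents of the real files.

```python
import pandas as pd

# Hypothetical subset of a name -> state_code mapping
state_id_map = {"Andhra Pradesh": 28, "Assam": 18, "Bihar": 10}

# Made-up spreadsheet column with one misspelled name
toy = pd.DataFrame({"States/UTs": ["Andhra Pradesh", "Asam", "Bihar"]})

# map() returns NaN where the key is absent instead of raising
toy["id"] = toy["States/UTs"].map(state_id_map)

# Names that need to be reconciled with the GeoJSON
unmatched = toy.loc[toy["id"].isna(), "States/UTs"].tolist()
print("Names without a GeoJSON match:", unmatched)
```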
fig = px.choropleth_mapbox(
df_s,
locations="id",
geojson=india_states,
color="Killed in Highway accident 2018",
hover_name="States/UTs",
hover_data=["Killed in Highway accident 2018"],
title="States/UTs-wise Total Number of Persons Killed in Road Accidents on State Highways from 2018 to 2021",
mapbox_style="carto-positron",
center={"lat": 24, "lon": 78},
zoom=3,
opacity=0.5
)
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
📌 Worldwide Accidental Deaths (1990-2019)
import plotly.offline as py
df_nation = pd.read_csv('country_accidents.csv')
df_nation.head()
|  | Entity | Code | Year | Deaths | Sidedness | Historical_Population |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | 1990 | 4154 | 0 | 12412311.0 |
| 1 | Afghanistan | AFG | 1991 | 4472 | 0 | 13299016.0 |
| 2 | Afghanistan | AFG | 1992 | 5106 | 0 | 14485543.0 |
| 3 | Afghanistan | AFG | 1993 | 5681 | 0 | 15816601.0 |
| 4 | Afghanistan | AFG | 1994 | 6001 | 0 | 17075728.0 |
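The highlights quote deaths per 1,00,000 population, and the `Deaths` and `Historical_Population` columns allow the same rate to be derived here. A minimal sketch using only the five Afghanistan rows shown above; the column name `deaths_per_100k` is an illustrative choice.

```python
import pandas as pd

# Rows copied from df_nation.head() above
afg = pd.DataFrame({
    "Entity": ["Afghanistan"] * 5,
    "Year": [1990, 1991, 1992, 1993, 1994],
    "Deaths": [4154, 4472, 5106, 5681, 6001],
    "Historical_Population": [12412311.0, 13299016.0, 14485543.0,
                              15816601.0, 17075728.0],
})

# Road deaths per 100,000 inhabitants
afg["deaths_per_100k"] = afg["Deaths"] / afg["Historical_Population"] * 100_000
print(afg[["Year", "deaths_per_100k"]].round(2))
```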
Country wise total deaths
df_nation.groupby('Entity')['Deaths'].sum().sort_values(ascending = False).reset_index().head(10)
|  | Entity | Deaths |
|---|---|---|
| 0 | World | 36317087 |
| 1 | G20 | 23328740 |
| 2 | Asia | 21670793 |
| 3 | World Bank Upper Middle Income | 16041327 |
| 4 | Middle SDI | 13623644 |
| 5 | East Asia & Pacific - World Bank region | 13035092 |
| 6 | World Bank Lower Middle Income | 12599627 |
| 7 | Southeast Asia, East Asia, and Oceania | 12411258 |
| 8 | Western Pacific Region | 10454671 |
| 9 | Commonwealth | 8831208 |
df_nation['Entity'].nunique()
267
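The top ten entities above are all aggregates (World, G20, Asia and so on) rather than countries. Assuming that aggregate rows lack a three-letter `Code`, which should be checked against the actual file, a filter along these lines would leave only countries before ranking. The rows below are made up.

```python
import pandas as pd

# Made-up rows: two countries and two aggregates without a code
toy = pd.DataFrame({
    "Entity": ["India", "India", "G20", "Asia", "China"],
    "Code": ["IND", "IND", None, None, "CHN"],
    "Deaths": [100, 120, 900, 700, 150],
})

# Assumption: a country row carries a three-letter code
is_country = toy["Code"].notna() & (toy["Code"].str.len() == 3)

# Total deaths per country, largest first
totals = (
    toy[is_country]
    .groupby("Entity")["Deaths"].sum()
    .sort_values(ascending=False)
)
print(totals)
```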
# Aggregate deaths per entity so that locations and z have the same length
deaths_by_country = df_nation.groupby('Entity')['Deaths'].sum()
countries = deaths_by_country.index.tolist()
data = [ dict(
    type = 'choropleth',
    locations = countries,
    z = deaths_by_country.tolist(),
    locationmode = 'country names',
    text = countries,
    marker = dict(
        line = dict(color = 'rgb(0,0,0)', width = 1)),
    colorbar = dict(autotick = True, tickprefix = '',
                    title = 'Total Deaths by Road Accident')
    )
]
layout = dict(
    title = 'Worldwide total number of deaths from road traffic incidents from 1990 to 2019',
    geo = dict(
        showframe = False,
        showocean = True,
        oceancolor = 'rgb(0,255,255)',
        projection = dict(
            type = 'orthographic',
            rotation = dict(
                lon = 60,
                lat = 10),
        ),
        lonaxis = dict(
            showgrid = True,
            gridcolor = 'rgb(102, 102, 102)'
        ),
        lataxis = dict(
            showgrid = True,
            gridcolor = 'rgb(102, 102, 102)'
        )
    ),
)
fig = dict(data=data, layout=layout)
py.iplot(fig, validate=False, filename='worldmap')
National and State-wise Comparison for India
National Trends:
At the national level, road accidents and fatalities may vary significantly due to differences in road infrastructure, traffic regulations, and enforcement practices. The national average provides a baseline for comparing individual states’ performance.
Deaths and Accident Cases:
The comparison of deaths and accident cases on a national level highlights regions with higher accident rates and fatalities. This can help prioritize areas for safety interventions and resource allocation.
State-wise Insights:
High-Risk States:
States with higher numbers of road accidents and fatalities require targeted interventions, such as improved road safety measures, better enforcement of traffic laws, and public awareness campaigns.
Effective States:
States with lower accident rates and fatalities may have effective road safety policies and practices that could be modeled in other regions.
Regional Differences:
Differences in road infrastructure, urbanization levels, and traffic density among states can influence accident rates. Tailored approaches are necessary to address state-specific challenges.
📌 Descriptive Statistics
Correlation between variables
from sklearn.preprocessing import LabelEncoder
# Exclude unnecessary columns
columns_to_keep = ['Month', 'Junction_Control', 'Junction_Detail', 'Accident_Severity', 'Latitude',
'Light_Conditions', 'Local_Authority_(District)', 'Carriageway_Hazards', 'Longitude',
'Number_of_Casualties', 'Number_of_Vehicles', 'Police_Force', 'Road_Surface_Conditions',
'Road_Type', 'Speed_limit', 'Urban_or_Rural_Area', 'Weather_Conditions', 'Vehicle_Type']
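# Note: 'Carriageway_Hazards' is dropped earlier in this notebook; this cell
# requires the column to still be present in df.
# .copy() avoids writing into a view of df (SettingWithCopyWarning)
df_corr = df[columns_to_keep].copy()
# Initialize LabelEncoder
label_encoder = LabelEncoder()
# Iterate through each column
for col in df_corr.columns:
    # Check if the column is categorical
    if df_corr[col].dtype == 'object':
        # Apply LabelEncoder to perform categorical to numerical transformation
        df_corr[col] = label_encoder.fit_transform(df_corr[col].astype(str))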
# Create a correlation matrix
corr_matrix = df_corr.corr()
corr_matrix
| | Month | Junction_Control | Junction_Detail | Accident_Severity | Latitude | Light_Conditions | Local_Authority_(District) | Carriageway_Hazards | Longitude | Number_of_Casualties | Number_of_Vehicles | Police_Force | Road_Surface_Conditions | Road_Type | Speed_limit | Urban_or_Rural_Area | Weather_Conditions | Vehicle_Type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Month | 1.000000 | 0.003259 | 0.010447 | -0.000800 | -0.007685 | 0.000165 | -0.000996 | 0.000569 | 0.009110 | -0.009141 | 0.006287 | -0.002062 | -0.032542 | 0.005644 | -0.020367 | 0.019713 | -0.021083 | -0.000137 |
| Junction_Control | 0.003259 | 1.000000 | 0.288568 | -0.000400 | -0.070671 | 0.042531 | -0.001454 | 0.008143 | -0.033068 | 0.000306 | 0.034596 | -0.105783 | 0.008309 | 0.101385 | 0.025817 | -0.071934 | -0.005122 | -0.006543 |
| Junction_Detail | 0.010447 | 0.288568 | 1.000000 | 0.029895 | -0.027699 | 0.005400 | 0.000790 | 0.036481 | 0.045583 | -0.042448 | 0.025797 | 0.006225 | -0.028606 | 0.093675 | -0.135112 | 0.104192 | -0.014495 | 0.002393 |
| Accident_Severity | -0.000800 | -0.000400 | 0.029895 | 1.000000 | -0.019656 | 0.025237 | -0.005521 | 0.003894 | 0.000240 | -0.075685 | 0.077072 | 0.003962 | 0.015108 | -0.014130 | -0.078333 | 0.079010 | 0.035499 | -0.002342 |
| Latitude | -0.007685 | -0.070671 | -0.027699 | -0.019656 | 1.000000 | 0.008491 | -0.055101 | -0.007792 | -0.365601 | 0.041867 | -0.029020 | 0.077707 | 0.077591 | 0.002057 | 0.055816 | -0.054922 | 0.037806 | 0.012560 |
| Light_Conditions | 0.000165 | 0.042531 | 0.005400 | 0.025237 | 0.008491 | 1.000000 | 0.007916 | 0.010625 | -0.032699 | -0.013825 | 0.055153 | -0.025724 | -0.173002 | 0.022608 | 0.089403 | -0.108488 | -0.120927 | -0.004599 |
| Local_Authority_(District) | -0.000996 | -0.001454 | 0.000790 | -0.005521 | -0.055101 | 0.007916 | 1.000000 | -0.001687 | -0.009510 | 0.001880 | 0.007789 | 0.115593 | -0.003998 | -0.006231 | 0.041666 | -0.052353 | -0.000046 | -0.005372 |
| Carriageway_Hazards | 0.000569 | 0.008143 | 0.036481 | 0.003894 | -0.007792 | 0.010625 | -0.001687 | 1.000000 | 0.002684 | -0.003429 | 0.038238 | 0.006561 | -0.015999 | 0.015012 | -0.070474 | 0.069826 | -0.004238 | -0.000588 |
| Longitude | 0.009110 | -0.033068 | 0.045583 | 0.000240 | -0.365601 | -0.032699 | -0.009510 | 0.002684 | 1.000000 | -0.050665 | 0.000713 | 0.108034 | -0.055747 | -0.003080 | -0.046843 | 0.097935 | -0.043172 | -0.003716 |
| Number_of_Casualties | -0.009141 | 0.000306 | -0.042448 | -0.075685 | 0.041867 | -0.013825 | 0.001880 | -0.003429 | -0.050665 | 1.000000 | 0.234499 | 0.005492 | 0.036684 | -0.048326 | 0.137064 | -0.112428 | 0.006589 | -0.002334 |
| Number_of_Vehicles | 0.006287 | 0.034596 | 0.025797 | 0.077072 | -0.029020 | 0.055153 | 0.007789 | 0.038238 | 0.000713 | 0.234499 | 1.000000 | -0.015658 | -0.014061 | -0.096143 | 0.079861 | -0.035670 | -0.011944 | -0.003861 |
| Police_Force | -0.002062 | -0.105783 | 0.006225 | 0.003962 | 0.077707 | -0.025724 | 0.115593 | 0.006561 | 0.108034 | 0.005492 | -0.015658 | 1.000000 | 0.002484 | -0.026724 | -0.053232 | 0.093195 | -0.013471 | 0.001302 |
| Road_Surface_Conditions | -0.032542 | 0.008309 | -0.028606 | 0.015108 | 0.077591 | -0.173002 | -0.003998 | -0.015999 | -0.055747 | 0.036684 | -0.014061 | 0.002484 | 1.000000 | -0.009246 | 0.095619 | -0.094933 | 0.507601 | -0.001999 |
| Road_Type | 0.005644 | 0.101385 | 0.093675 | -0.014130 | 0.002057 | 0.022608 | -0.006231 | 0.015012 | -0.003080 | -0.048326 | -0.096143 | -0.026724 | -0.009246 | 1.000000 | -0.335244 | 0.087865 | -0.002802 | 0.002544 |
| Speed_limit | -0.020367 | 0.025817 | -0.135112 | -0.078333 | 0.055816 | 0.089403 | 0.041666 | -0.070474 | -0.046843 | 0.137064 | 0.079861 | -0.053232 | 0.095619 | -0.335244 | 1.000000 | -0.683529 | 0.031258 | -0.002081 |
| Urban_or_Rural_Area | 0.019713 | -0.071934 | 0.104192 | 0.079010 | -0.054922 | -0.108488 | -0.052353 | 0.069826 | 0.097935 | -0.112428 | -0.035670 | 0.093195 | -0.094933 | 0.087865 | -0.683529 | 1.000000 | -0.032542 | 0.005930 |
| Weather_Conditions | -0.021083 | -0.005122 | -0.014495 | 0.035499 | 0.037806 | -0.120927 | -0.000046 | -0.004238 | -0.043172 | 0.006589 | -0.011944 | -0.013471 | 0.507601 | -0.002802 | 0.031258 | -0.032542 | 1.000000 | -0.000585 |
| Vehicle_Type | -0.000137 | -0.006543 | 0.002393 | -0.002342 | 0.012560 | -0.004599 | -0.005372 | -0.000588 | -0.003716 | -0.002334 | -0.003861 | 0.001302 | -0.001999 | 0.002544 | -0.002081 | 0.005930 | -0.000585 | 1.000000 |
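`LabelEncoder` assigns integer codes in sorted order of the labels, so an ordered category such as `Accident_Severity` does not receive codes in order of severity. A minimal sketch of an explicit ordinal mapping is shown below. It assumes the severity labels are exactly `Slight`, `Serious`, and `Fatal` (only the first two appear in the preview of the data, so `Fatal` is an assumption), and `encode_severity` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# Assumed severity labels, ordered from least to most severe.
SEVERITY_ORDER = ["Slight", "Serious", "Fatal"]

def encode_severity(series: pd.Series) -> pd.Series:
    """Map severity labels to 0, 1, 2 in order of increasing severity.

    Labels outside SEVERITY_ORDER become -1 so they can be spotted and handled.
    """
    cat = pd.Categorical(series, categories=SEVERITY_ORDER, ordered=True)
    return pd.Series(cat.codes, index=series.index, name=series.name)

sample = pd.Series(["Serious", "Slight", "Fatal", "Slight"], name="Accident_Severity")
codes = encode_severity(sample)
print(codes.tolist())  # [1, 0, 2, 0]
```

With such a mapping, a positive correlation with `Accident_Severity` would mean "more severe", which makes the sign of the coefficient easier to interpret.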
Heatmap
# Plotting the heatmap
plt.figure(figsize=(12, 12))
sns.heatmap(corr_matrix, annot=True, cmap='viridis', fmt='.2f', linewidths=1)
plt.title('Correlation Heatmap')
plt.show()
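Several columns in the heatmap are nominal categories (for example `Road_Type` or `Weather_Conditions`), and a Pearson coefficient on their integer codes depends on an arbitrary label order. One alternative, not used in the original analysis, is Cramér's V, which measures association between two categorical variables on a 0 to 1 scale. The sketch below computes it with pandas and NumPy only. The function name `cramers_v` and the toy labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V (without bias correction) between two categorical series."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    if k == 0:
        return float("nan")  # undefined when either variable has a single level
    return float(np.sqrt(chi2 / (n * k)))

# Toy labels for illustration only (not taken from the dataset).
weather = pd.Series(["Rain", "Rain", "Fine", "Fine", "Rain", "Fine"])
surface = pd.Series(["Wet", "Wet", "Dry", "Dry", "Wet", "Dry"])
print(cramers_v(weather, surface))
```

On the real data this could be applied as `cramers_v(df['Weather_Conditions'], df['Road_Surface_Conditions'])`. Unlike a correlation coefficient, the result carries no sign, so it indicates the strength of an association but not its direction.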
Insights and Analysis
A few insights can be drawn from the correlations between variables. Note that the categorical columns were label-encoded with integer codes that follow the sorted order of the labels, so coefficients involving them indicate association only loosely, and their sign depends on the encoding.
Month and Weather_Conditions: The correlation between Month and Weather_Conditions is -0.02, which is effectively zero. This linear measure on encoded labels shows no clear seasonal pattern in weather conditions; any seasonality would need to be examined month by month.
Junction_Control and Road_Type: There is a weak positive correlation of 0.10 between Junction_Control and Road_Type. This suggests that the type of junction control (e.g., traffic signals, roundabouts) may be loosely related to the type of road (e.g., single carriageway, dual carriageway).
Accident_Severity and Number_of_Casualties: There is a weak negative correlation of -0.08 between Accident_Severity and Number_of_Casualties. The sign depends on how the severity labels were encoded: if the labels are Fatal, Serious, and Slight, the alphabetical encoding gives Fatal the lowest code, so a negative coefficient points to slightly more casualties in more severe accidents, not fewer. In either case the relationship is very weak.
Latitude and Longitude: There is a negative correlation of -0.37 between Latitude and Longitude. This reflects the geographical distribution of the recorded accident locations and is not a causal relationship.
Speed_limit and Road_Type: There is a moderate negative correlation of -0.34 between Speed_limit and Road_Type. This suggests that different types of roads (e.g., motorways vs urban roads) have different speed limits, which is expected. Because Road_Type is a nominal category, the sign of the coefficient has no direct interpretation.
Road_Surface_Conditions and Weather_Conditions: There is a moderate positive correlation of 0.51 between Road_Surface_Conditions and Weather_Conditions. This indicates that adverse weather conditions (e.g., rain, snow) are associated with poorer road surface conditions (e.g., wet, icy).
Urban_or_Rural_Area and Speed_limit: There is a strong negative correlation of -0.68 between Urban_or_Rural_Area and Speed_limit, the strongest relationship in the matrix. This suggests that speed limits tend to be lower in urban areas than in rural areas, possibly due to higher traffic density and safety considerations.
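The pairs discussed above can also be picked out programmatically instead of by reading the heatmap. The sketch below ranks off-diagonal pairs by absolute correlation. `strongest_pairs` is a hypothetical helper, and the small matrix is a five-variable excerpt copied from the correlation table above so that the example runs on its own.

```python
import numpy as np
import pandas as pd

def strongest_pairs(corr: pd.DataFrame, top: int = 5) -> pd.DataFrame:
    """Return the `top` off-diagonal pairs with the largest absolute correlation."""
    # Keep only the strict upper triangle so each pair appears once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna().rename("corr").reset_index()
    pairs.columns = ["var_1", "var_2", "corr"]
    order = pairs["corr"].abs().sort_values(ascending=False).index
    return pairs.loc[order].head(top).reset_index(drop=True)

# Excerpt of the correlation matrix shown earlier (five of the eighteen variables).
cols = ["Road_Surface_Conditions", "Road_Type", "Speed_limit",
        "Urban_or_Rural_Area", "Weather_Conditions"]
values = np.array([
    [1.0, -0.009246, 0.095619, -0.094933, 0.507601],
    [-0.009246, 1.0, -0.335244, 0.087865, -0.002802],
    [0.095619, -0.335244, 1.0, -0.683529, 0.031258],
    [-0.094933, 0.087865, -0.683529, 1.0, -0.032542],
    [0.507601, -0.002802, 0.031258, -0.032542, 1.0],
])
sub_corr = pd.DataFrame(values, index=cols, columns=cols)

top_pairs = strongest_pairs(sub_corr, top=3)
print(top_pairs)
```

In the notebook itself, `strongest_pairs(corr_matrix)` would give the same ranking over all eighteen variables.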
Conclusion
The analysis of road accident data for 2021-22 reveals significant insights into the factors influencing road safety, including weather conditions, road and junction types, geographical distribution, and speed limits. By understanding these correlations, policymakers and stakeholders can develop targeted strategies to mitigate road accidents and enhance safety. Additionally, the national and state-wise comparisons for India provide valuable information for identifying high-risk areas and implementing effective safety measures. Overall, a data-driven approach to road safety can help reduce accidents and save lives.